Rows: 25,480
Columns: 26
$ 發病日 <date> 2023-07-31, 2023-07-31, 2023-07-31, 2023-07-31, 20…
$ 個案研判日 <chr> "2023/07/31", "2023/07/31", "2023/07/31", "2023/07/…
$ 通報日 <date> 2023-07-31, 2023-07-31, 2023-07-31, 2023-07-31, 20…
$ 性別 <chr> "女", "男", "男", "男", "男", "男", "女", "女", "男…
$ 年齡層 <chr> "30-34", "55-59", "5-9", "70+", "55-59", "30-34", "…
$ 居住縣市 <chr> "台南市", "台南市", "台南市", "台南市", "台南市", "…
$ 居住鄉鎮 <chr> "永康區", "東區", "永康區", "仁德區", "永康區", "古…
$ 居住村里 <chr> "埔園里", "大智里", "五王里", "成功里", "中興里", "…
$ 最小統計區 <chr> "A6731-0201-00", "A6732-1040-00", "A6731-0574-00", …
$ 最小統計區中心點X <chr> "120.253752333", "120.232374917", "120.235733496", …
$ 最小統計區中心點Y <chr> "23.031699814", "22.962366283", "23.013083716", "22…
$ 一級統計區 <chr> "A6731-16-006", "A6732-64-001", "A6731-42-008", "A6…
$ 二級統計區 <chr> "A6731-16", "A6732-64", "A6731-42", "A6727-10", "A6…
$ 感染縣市 <chr> "台南市", "台南市", "台南市", "台南市", "台南市", "…
$ 感染鄉鎮 <chr> "永康區", "東區", "永康區", "仁德區", "永康區", "古…
$ 感染村里 <chr> "None", "None", "None", "None", "None", "None", "No…
$ 是否境外移入 <chr> "否", "否", "否", "否", "否", "否", "否", "否", "否…
$ 感染國家 <chr> "中華民國", "中華民國", "中華民國", "中華民國", "中…
$ 確定病例數 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ 居住村里代碼 <chr> "6703100-004", "6703200-003", "6703100-001", "67027…
$ 感染村里代碼 <chr> "None", "None", "None", "None", "None", "None", "No…
$ 血清型 <chr> "None", "None", "None", "None", "第一型", "第一型",…
$ 內政部居住縣市代碼 <chr> "67", "67", "67", "67", "67", "10009", "67", "67", …
$ 內政部居住鄉鎮代碼 <chr> "6703100", "6703200", "6703100", "6702700", "670310…
$ 內政部感染縣市代碼 <chr> "67", "67", "67", "67", "67", "10009", "67", "67", …
$ 內政部感染鄉鎮代碼 <chr> "6703100", "6703200", "6703100", "6702700", "670310…
Take-home Exercise 2: Application of Spatial and Spatio-temporal Analysis Methods to Discover the Distribution of Dengue Fever in Tainan City, Taiwan
Overview
Setting The Scene
Dengue Hemorrhagic Fever (in short dengue fever) is one of the most widespread mosquito-borne diseases in the most tropical and subtropical regions. It is an acute disease caused by dengue virus infection which is transmitted by female Aedes aegypti and Aedes albopictus mosquitoes. In 2015, Taiwan had recorded the most severe dengue fever outbreak with more than 43,000 dengue cases and 228 deaths. Since then, the annual reported dengue fever cases were maintained at the level of not more than 200 cases. However, in 2023, Taiwan recorded 26703 dengue fever cases. More than 25,000 cases were reported at Tainan City, and more than 80% of the reported dengue fever cases occurred in the month August-November 2023 and epidemiology week 31-50.
Objectives
As a curious geospatial analytics green horn, you are interested to discover:
- if the distribution of dengue fever outbreak at Tainan City, Taiwan are independent from space and space and time.
- If the outbreak is indeed spatial and spatio-temporal dependent, then, you would like to detect where are the clusters and outliers, and the emerging hot spot/cold spot areas.
The Task The specific tasks of this take-home exercise are as follows:
- Using appropriate function of sf and tidyverse, preparing the following geospatial data layer:
- a study area layer in sf polygon features. It must be at village level and confined to the D01, D02, D04, D06, D07, D08, D32 and D39 counties of Tainan City, Taiwan.
- a dengue fever layer within the study area in sf point features. The dengue fever cases should be confined to epidemiology week 31-50, 2023.
- a derived dengue fever layer in spacetime s3 class of sfdep. It should contain, among many other useful information, a data field showing number of dengue fever cases by village and by epidemiology week.
- Using the extracted data, perform global spatial autocorrelation analysis by using sfdep methods.
- Using the extracted data, perform local spatial autocorrelation analysis by using sfdep methods.
- Using the extracted data, perform emerging hotspot analysis by using sfdep methods.
- Describe the spatial patterns revealed by the analysis above.
The Data
For the purpose of this take-home exercise, two data sets are provided, they are:
TAIWAN_VILLAGE_2020, a geospatial data of village boundary of Taiwan. It is in ESRI shapefile format. The data is in Taiwan Geographic Coordinate System. (Source: Historical map data of the village boundary: TWD97 longitude and latitude)
Dengue_Daily.csv, an aspatial data of reported dengue cases in Taiwan since 1998. (Source: Dengue Daily Confirmed Cases Since 1998. Below are selected fields that are useful for this study:
- 發病日: Onset date
- 最小統計區中心點X: x-coordinate
- 最小統計區中心點Y: y-coordinate Both data sets have been uploaded on eLearn. Students are required to download them from eLearn.
Getting Started
Loading Packages
We can use this code chunk to load the required packages
Loading Data and Data Wrangling
Aspatial Data
We can load the dengue_daily data with the code chunk below. Since we are only stuyding cases from week 31-50, we can use filter() to get the dates from 31 July to 17 December, which is week 31-50
In order to save up on computational resources and make it more readable, we will only take up the three fields mentioned above and rename it to English
Rows: 25,480
Columns: 3
$ OnsetDate <date> 2023-07-31, 2023-07-31, 2023-07-31, 2023-07-31, 2023-07-…
$ X_Coordinate <chr> "120.253752333", "120.232374917", "120.235733496", "120.2…
$ Y_Coordinate <chr> "23.031699814", "22.962366283", "23.013083716", "22.95747…
Since the coordinates are still in chr format, we need to convert it to numeric first
OnsetDate X_Coordinate Y_Coordinate
Min. :2023-07-31 Min. :118.3 Min. :21.99
1st Qu.:2023-09-11 1st Qu.:120.2 1st Qu.:22.97
Median :2023-10-01 Median :120.2 Median :22.99
Mean :2023-10-03 Mean :120.3 Mean :23.01
3rd Qu.:2023-10-26 3rd Qu.:120.3 3rd Qu.:23.02
Max. :2023-12-17 Max. :121.9 Max. :25.20
NA's :14 NA's :14
Since the data still have some missing values, let’s clean that up first
OnsetDate X_Coordinate Y_Coordinate
Min. :2023-07-31 Min. :118.3 Min. :21.99
1st Qu.:2023-09-11 1st Qu.:120.2 1st Qu.:22.97
Median :2023-10-01 Median :120.2 Median :22.99
Mean :2023-10-03 Mean :120.3 Mean :23.01
3rd Qu.:2023-10-26 3rd Qu.:120.3 3rd Qu.:23.02
Max. :2023-12-17 Max. :121.9 Max. :25.20
After the data is clean, we can convert it into sf. Remember to convert the crs to TWD97 (crs=3824)
Rows: 25,466
Columns: 2
$ OnsetDate <date> 2023-07-31, 2023-07-31, 2023-07-31, 2023-07-31, 2023-07-31,…
$ geometry <POINT [°]> POINT (120.2538 23.0317), POINT (120.2324 22.96237), P…
Geospatial Data
We can use st_read() to load the geospatial data, with an additional filter to get only the counties mentioned above. We can plot it to get a better understanding of the data
Reading layer `TAINAN_VILLAGE' from data source
`C:\Users\yozaf\SMUY3S2\Geospatial\IS415-GAA\Take-home_Ex\Take-home_Ex02\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 649 features and 10 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0269 ymin: 22.88751 xmax: 120.6563 ymax: 23.41374
Geodetic CRS: TWD97
Rows: 258
Columns: 11
$ VILLCODE <chr> "67000350032", "67000270011", "67000370005", "67000330004",…
$ COUNTYNAME <chr> "臺南市", "臺南市", "臺南市", "臺南市", "臺南市", "臺南市",…
$ TOWNNAME <chr> "安南區", "仁德區", "中西區", "南區", "安南區", "安南區", "…
$ VILLNAME <chr> "青草里", "保安里", "赤嵌里", "大成里", "城北里", "城南里",…
$ VILLENG <chr> "Qingcao Vil.", "Bao'an Vil.", "Chihkan Vil.", "Dacheng Vil…
$ COUNTYID <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D",…
$ COUNTYCODE <chr> "67000", "67000", "67000", "67000", "67000", "67000", "6700…
$ TOWNID <chr> "D06", "D32", "D08", "D02", "D06", "D06", "D08", "D06", "D0…
$ TOWNCODE <chr> "67000350", "67000270", "67000370", "67000330", "67000350",…
$ NOTE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ geometry <POLYGON [°]> POLYGON ((120.1176 23.08387..., POLYGON ((120.2304 …

Data Wrangling
Now we join the two data with st_join. Here, we use EpiWeek to group together cases in the same week. Don’t forget to handle missing values as well
OnsetDate geometry VILLCODE
Min. :2023-07-31 POINT :18800 Length:18800
1st Qu.:2023-09-09 epsg:3824 : 0 Class :character
Median :2023-09-27 +proj=long...: 0 Mode :character
Mean :2023-09-28
3rd Qu.:2023-10-17
Max. :2023-12-17
COUNTYNAME TOWNNAME VILLNAME VILLENG
Length:18800 Length:18800 Length:18800 Length:18800
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
COUNTYID COUNTYCODE TOWNID TOWNCODE
Length:18800 Length:18800 Length:18800 Length:18800
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
EpiWeek
Min. :31.00
1st Qu.:36.00
Median :39.00
Mean :39.06
3rd Qu.:42.00
Max. :50.00
After joining, we can make a new sf data to store the summary of how many cases appear in a specific village in a specific week.
VILLCODE EpiWeek NumberOfCases geometry
Length:3514 Min. :31.00 Min. : 1.00 MULTIPOINT :2523
Class :character 1st Qu.:36.00 1st Qu.: 1.00 POINT : 991
Mode :character Median :40.00 Median : 3.00 epsg:3824 : 0
Mean :39.87 Mean : 5.35 +proj=long...: 0
3rd Qu.:44.00 3rd Qu.: 7.00
Max. :50.00 Max. :56.00
Now, to get the geometry of the villages, we can combine it with the tainan dataset. Before combining, we need to convert dengue_summary to df by dropping the geometry column. But, note that this df data does not contain rows where there are no case at all in a particular village in a particular week. To fix that, we can use complete() with the parameters in the chunk below. What it does is that it will automatically add a row for every combination of VILLCODE and EpiWeek possible, and if there are no cases for that particular combination, the fill argument will add a 0 instead of NA in the NumberOfCases column
Then, we can use left_join() to join the datasets based on the same “VILLCODE”. We can use select() to retrieve only the columns we need. Remember to turn the joined data back to sf
Let’s check the CRS of the new sf to make sure that its in the same reference system
Coordinate Reference System:
User input: TWD97
wkt:
GEOGCRS["TWD97",
DATUM["Taiwan Datum 1997",
ELLIPSOID["GRS 1980",6378137,298.257222101,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["Taiwan, Republic of China - onshore and offshore - Taiwan Island, Penghu (Pescadores) Islands."],
BBOX[17.36,114.32,26.96,123.61]],
ID["EPSG",3824]]
Creating Spatiotemporal Layer
To create a spatiotemporal layer, we can use as_spacetime() for the dengue_summary_sf, with VILLCODE for the spatial column and EpiWeek to represent the temporal column
To see the data, we can use activate() with ‘data’ for the second argument
# A tibble: 5,140 × 3
VILLCODE EpiWeek NumberOfCases
* <chr> <dbl> <int>
1 67000270001 31 0
2 67000270001 32 0
3 67000270001 33 1
4 67000270001 34 1
5 67000270001 35 2
6 67000270001 36 3
7 67000270001 37 5
8 67000270001 38 4
9 67000270001 39 3
10 67000270001 40 2
# ℹ 5,130 more rows
On the other hand, the geometry values can be seen with activate() also, only with ‘geometry’ for the second argument
Simple feature collection with 257 features and 1 field
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 120.0627 ymin: 22.89401 xmax: 120.2925 ymax: 23.09144
Geodetic CRS: TWD97
# A tibble: 257 × 2
VILLCODE geometry
* <chr> <POLYGON [°]>
1 67000270001 ((120.2672 22.99653, 120.2673 22.99616, 120.2675 22.9958, 120.26…
2 67000270002 ((120.266 22.99104, 120.266 22.99096, 120.2659 22.99083, 120.265…
3 67000270003 ((120.2492 22.98265, 120.2497 22.9826, 120.2504 22.98251, 120.25…
4 67000270004 ((120.2391 22.98008, 120.2393 22.98001, 120.2394 22.97996, 120.2…
5 67000270005 ((120.2578 22.97427, 120.2584 22.97398, 120.2591 22.97356, 120.2…
6 67000270006 ((120.2713 22.96793, 120.2712 22.96777, 120.2712 22.96766, 120.2…
7 67000270007 ((120.2408 22.959, 120.2422 22.95846, 120.2435 22.95791, 120.244…
8 67000270008 ((120.2701 22.94837, 120.2701 22.94824, 120.27 22.94819, 120.27 …
9 67000270011 ((120.2304 22.93544, 120.2306 22.93519, 120.2319 22.93271, 120.2…
10 67000270012 ((120.2251 22.96159, 120.2256 22.96149, 120.2261 22.9614, 120.22…
# ℹ 247 more rows
Before using the spacetime layer in further computations, we can check if it is a proper cube
[1] TRUE
Visualizing Data
Now, we are plotting a choropleth to see the distribution of cases using tmap